Automatic Classification of Cancer Notifiable Death Certificates
Abstract
The timely notification of cancer cases is crucial for cancer monitoring and prevention. However, abstracting and classifying cancer from the free text of pathology reports and other relevant documents, such as death certificates, is complex and time-consuming. In this paper we investigate approaches for automatically detecting, from free-text death certificates supplied to Cancer Registries, cases where the cause of death is a notifiable cancer. A number of machine learning classifiers were investigated. A large set of features was also extracted using natural language processing techniques and the Medtex toolkit; features include stemmed words, bigrams, and concepts from the SNOMED CT medical terminology. The investigated approaches were found to be very effective in identifying death certificates where the cause of death was a notifiable cancer. The best performance was achieved by a Support Vector Machine (SVM) classifier, with an overall F-measure of 0.9647 when evaluated on a set of 5,000 free-text death certificates. This classifier uses as features stemmed token bigrams and information from SNOMED CT concepts filtered by morphological abnormalities and disorders. However, our analysis shows that it is the selection of features, rather than the type of classifier or the feature weighting scheme, that most influences classifier performance. Specifically, we found that stemmed token bigrams, with or without SNOMED CT concepts, are the most effective features, and that the combination of token bigrams and SNOMED CT information yields the best overall performance.
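To make the two central notions in the abstract concrete, the following is a minimal sketch of stemmed token bigram extraction and of the F-measure. It is not the authors' pipeline: it does not use the Medtex toolkit, SNOMED CT concepts, or an SVM. The function names, the toy suffix-stripping stemmer, and the example certificate text are all assumptions introduced for illustration, and the F-measure is assumed here to be the balanced F1 score over binary labels.

```python
import re
from collections import Counter


def crude_stem(token: str) -> str:
    """Toy stemmer (an assumption, not a real stemming algorithm):
    strip one common English suffix if at least 3 characters remain."""
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_bigrams(text: str) -> Counter:
    """Lowercase the text, keep alphabetic tokens, stem each token,
    and count adjacent token pairs (bigrams)."""
    tokens = [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]
    return Counter(zip(tokens, tokens[1:]))


def f_measure(gold: list, predicted: list) -> float:
    """Balanced F-measure (F1) for binary labels, where True marks a
    certificate whose cause of death is a notifiable cancer."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical certificate text, invented for this example.
    features = stemmed_bigrams("Metastatic carcinoma of the lungs")
    print(sorted(features))

    # Hypothetical gold labels and classifier predictions.
    gold = [True, True, False, True]
    predicted = [True, False, False, True]
    print(round(f_measure(gold, predicted), 4))
```

In a full system, bigram counts like these would be turned into a weighted feature vector per certificate and passed to a classifier; the abstract reports that the choice of such features mattered more than the classifier type or the weighting scheme.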
Similar resources
Classification of cancer-related death certificates using machine learning.
BACKGROUND Cancer monitoring and prevention relies on the critical aspect of timely notification of cancer cases. However, the abstraction and classification of cancer from the free-text of pathology reports and other relevant documents, such as death certificates, exist as complex and time-consuming activities. AIMS In this paper, approaches for the automatic detection of notifiable cancer c...
Surveillance for Cancer Incidence and Mortality - United States, 2012.
This report provides, in tabular and graphic form, official federal statistics on the occurrence of cancer for 2012 and trends for 1999-2012 as reported by CDC and the National Cancer Institute (NCI) (1). Cancer incidence data are from population-based cancer registries that participate in CDC's National Program of Cancer Registries (NPCR) and NCI's Surveillance, Epidemiology, and End Results (...
Quality comparison of electronic versus paper death certificates in France, 2010
BACKGROUND Electronic death certification was established in France in 2007. A methodology based on intrinsic characteristics of death certificates was designed to compare the quality of electronic versus paper death certificates. METHODS All death certificates from the 2010 French mortality database were included. Three specific quality indicators were considered: (i) amount of information, ...
Does quality control of death certificates in hospitals have an impact on cause of death statistics?
BACKGROUND The effects of inaccurate death certificates on cause of death statistics are uncertain. Since 2008, Akershus University Hospital has systematically corrected all death certificates. The effects of these corrections on the total cause of death statistics from the hospital were studied. MATERIAL AND METHOD ICD-10 codes for the underlying cause of death on the original and the correc...
Automatic classification of diseases from free-text death certificates for real-time surveillance
BACKGROUND Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the...